Fusion of medical images using Nabla operator; Objective evaluations and step-by-step statistical comparisons

Since vectors include direction and magnitude, they have more information than scalars. So, converting the scalar images into the vector field leads achieving much information about the images that have been hidden in the spatial domain. In this paper, the proposed method fuses images after transforming the scalar field of images to a vector one. To transform the field, it uses Nabla operator. After that, the inverse transform is implemented to reconstruct the fused medical image. To show the performance of the proposed method and to evaluate it, different experiments and statistical comparisons were accomplished. Comparing the experimental results with the previous works, shows the effectiveness of the proposed method.


Introduction
Medical images, which have different useful medical information, are important tools in diagnosis. There are different types of medical images, such as magnetic resonance imaging (MRI), X-ray, magnetic resonance angiography (MRA), and CT images, which make complementary information [1]. The medical imaging method is formed based on radiation, visible-light imaging, microscopy, multimodal imaging, and image fusion with anatomical and physiological data simplifies diagnosis [2]. These multimodal medical images usually can provide enough information when be considered together. Hence, fusing them as a one result image is helpful for physicians to receive the entire required information in one medical image. The combination of significant information of some images into a single one is called image fusion. Since the resulted medical images can be analyzed at a higher level, it becomes one of the interested research area in recent years [3]. In recent years, medical Images have often been used by Computational Analysis, Reinforcement Learning and Deep Learning methods in the world of artificial intelligence [4][5][6][7][8]. The medical image fusion method employing supervised deep learning was examined to develop a new idea of image fusion [9]. A deep belief networks based on image fusion framework was proposed by applying several characteristic extraction techniques and selecting potential features to improve the image fusion [10].
Different benchmarks exist to classify image fusion methods. In the viewpoint of what domain is used, three types of image fusion are known: pixel, feature, and decision level [11].  [46,47]. This is because it represents the spatial structures efficiently. To use this transform, at first, discontinuities are found by Laplacian Pyramid. Then these discontinuities are linked into liner structures using filter banks. Moreover, Shearlet is another useful tool that scientist applied for image fusion in recent years [48,49]. Advantages such as efficiency in computation and the size of support over the contourlet made the shearlet more popular in multi-scale representation [50]. Edgepreserving filtering and heat diffusion are two multi-scale representation tools that are applied recently in image fusion [48,51]; e.g. combining Gaussian and bilateral filters [52]. However, selecting multiscale decomposition method and choosing proper fusion method in the transformed domain are difficulties in this class of medical image fusion methods [53].
In the second class of medical image fusion methods, the sparse representation based methods are used [54,55]. It computes some weighted sparse coefficients that render the image. Multimodal medical image fusion based on a hybrid approach of NSCT and DTCWT utilizes a convolutional network to generate a weight map that incorporates pixel movement information from dual or more multimodality medical pictures to provide highest-quality fused multimodal medical pictures [56]. Yang et al. [41] applied a sparse coding and dictionary learning to the image fusion problem. After that, different schemes of sparse coding and dictionary learning were represented. To increase the capability of multi-source signal protection and preserve edge/texture information, a joint sparse model was designed with coupled dictionary learning for multimodal medical image fusion [57]. For example, Yang and Li [58] designed a window based activity level measurement, Yin et al. [59] combined choose-max and weighted average based coefficients, Li et al. [60] and Zhu et al. [61] introduced substitution of sparse coefficients. Chen et al. [62] reflects the sharp edges using gradient sparsity in sparse coefficients. Different works have been done for Medical Images, some of which belong to Computational Analysis methods that use space transformation, including Mean Value Guided Contour, which is brought into the gradient image space and performs Segmentation operations [63]. Nejati et al. [64] used input images to learn a dictionary that reflects the structures. To learn better, Wang et al. [51] added spatial details dictionary and Kim et al. [65] learned different dictionaries based on structures of the input images. Zhang et al. [66] designed a multi focus image fusion based on sparse representation. Yet, finding sparse coding and dictionary learning are two main difficulties related to this class of medical image fusion [35].
In the third class of image fusion methods, the domain of input images is transformed in a way that the dimension reduces. For example, Laben et al. [67] used Gram Schmidt transform, Tu et al. [68] used intensity, hue and saturation, and Shahdoosti and Ghassemian [69] used principal component analysis. Mitianoudis and Stathaki [70] used pixel and region based fusion methods in the independent component analysis transformation domain. Rahmani et al. [71] used different bands to find proper weights that leads optimal intensity component. Choi et al. [72] used linear regression model by synthetic components. Sun et al. [61] transformed the image domain using gradient operator and then the Markov random field is used in the new domain to fuse the image. Like other gradient domain image reconstruction methods, the result is reconstructed by solving Poisson equation. A new color multi-focus image fusion based on the computation of structural gradient with the deep ResNet to enhance the detailed information of the image is proposed [73]. Kang et al. [74] decompose images into spectral foreground and background using matting model. Balasubramaniam and Ananthi [75] used the maximum and minimum operations on transferred input images into the fuzzy domain. Using fuzzy rules, a fuzzy neural information-processing block that transformed an input image into a fuzzy domain using fuzzy functions was proposed, and fuzzy hierarchical fusion attention neural network based on multiscale guided learning was designed [76]. Generating artifacts on the edges may be the most important drawback of these methods [29]. One way to overcome is to post process the weight maps. Zhang et al. [77] used KNN matting with the weights that calculated by local features. Liu et al. [78] combined local comparison and feature matting to update the decision map that is created by activity level of source images.
Plus the above strategies and performing domains, some methods combine different methods and transforms to use their advantages and overcome their drawbacks. For instance, Zhang and Hong [79] combined HIS and wavelet to use their advantages and overcome the color distortion. Daneshvar and Ghassemian [80] combined HIS and multiscale decomposition for medical image fusion. The aim was to overcome the weak points in HIS fusion method. Li and Yang [81] combined contourlet and wavelet. Wang et al. [82] combined sparse representation and non-subsampled contourlet transform. Jiang and Wang [83] decomposed input images by morphological component analysis and used sparse representation method as fusion method. Liu et al. [78] used sparse representation image fusion method by transforming source images to the multi-scale domain. Ghimpeţeanu et al. [84] used frame-based decomposition framework (MFDF) for image fusion. Moonon et al. [85] and Singh et al. [86] designed the nonsampled shearlet transform (NSST) for remote sensing and medical image fusion, respectively. Liu et al.
[53] combined MFDF and NSST as a new multi-modality medical image fusion method (MFDF-NSST). Various fusion of multimodal medical images by PCNN with QCSA and SSO optimization techniques are Studied to improve evaluation parameters [87].
In this paper, a novel medical image fusion method is proposed, which uses Nabla operator to fuse images in a transformed domain. The main contribution of this work is that the proposed method first converts the medical images into a vector field. Since each vector are compound of magnitude and direction, they replete with information that are inaccessible in scalars. Therefore, fusing images occurs in the new vector filed-resulted by Nabla operatorthe fused image then is achieved by the inverse transform.
The rest of the paper is ordered as follows. In Section 2, Nabla operator is reviewed. The proposed method is presented in Section 3. Section 4 shows Experimental results and the results are compared to the previous works. The Discussion and conclusion are presented in Section 5 and 6, respectively.

Ethics approval and consent to participate
The paper studies involving a free online Medical Image database and we state ethics and consent "Not applicable" in this paper.
The fusion of Medical Images using Nabla Operator includes the following steps: • Tansforming using Nabla operator • Fusing objects in the vector field

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons • Reconstructing by the inverse transform • Experimental results

Study design
As a whole, the proposed method fuses two or more objects as follows. At first, it transforms the images into a vector field using Nabla operator. In the second step, the vectors of the objects in the transformed images are fused using weighted averaging method. In the third and last step, the resulted fused object is reconstructed by the inverse transform process. The steps are shown in Fig 1.

Tansforming using Nabla operator
In Section 2, we show that the Nabla operator and the integral operator are complementary operators. Now, we introduce Nabla operator in discrete mode. Finite differences are acceptable approximation of derivatives in discrete cases. There are three types of finite differences: forward, backward, and central direction. Although the paper glances at the other two ones, because of its advantages, the proposed method uses the central one to transform the images using Nabla operator. In discrete, Nabla operator may transform a 2D function as follows: where h, k!0. The linear transformation conserves the additivity and homogeneity conditions. So, the conditions are also satisfied in central direction. By assuming h, k = 1, the above equations become:

PLOS ONE
By implementing the above equations for each pixel independently, the proposed method transforms the image to a vector field. In other words, each pixel is converted to a vector that its decompositions in the Cartesian coordinate system along thex andŷ directions are P c and Q c respectively.

Fusing objects in the vector field
After changing the filed from the scalar to the vector space using Nabla operator, the proposed method fuses them with the aim of combining their features into a single one [88] and integrating their complementary information, so that the fused medical image has additional clinical information that no longer accessible when they are seen individually. In this step the proposed method fuses each peer to peer vector of two or more transformed images by one of three basic models: Nabla-weighted model. To formulize this step, consider I i {i = 1,. . .,n} are the images that should be fused by the proposed method. In step one-namely in Section 3.1 -all the images are transformed to the vector field; call them D i . Each D i contains two individual values P i , Q i that together make the vector field. The proposed method fuse the peer to peer pixels of these values in a way that for each pixel located at (x,y) we have: where w i indicates the weight of image I i . Therefore, when one image has the higher weight, it contributes more in the fusion process. However, the weights should be positive or equal to zero, and since division by zero is not allowed there should be at least one positive weight. Nabla-max model. Assuming Ii, D i , P i , Q i as before, for each point (x,y) the proposed method fuse the source images as follows:

Reconstructing by the inverse transform
To reconstruct the fused image from � P and � Q, the proposed method arises the 1D heat equation, because of their similarities. Discretizing the continuous heat equation using finite differences is shown in Fig 2. In this figure, U(x(i), t(m)) is shown by U(i, m).
Moreover, 2-D Laplacian equation is: So, the last two equations become: By assuming b(i,j) = 4f(i,j) the above equation becomes: 4Uði; jÞ À Uði À 2; jÞ À Uði þ 2; jÞ À Uði; j À 2Þ À Uði; j þ 2Þ ¼ bði; jÞ Therefore, U(i,j) can be approximated by the above equation, as it can shown in Fig 3. In other words, the problem of finding such U is summarized to above equation. The Jacobi's Method is one of the solutions to this equation.

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons To use Jacobi's method to solve the above equation, it should be rewritten as follows: Then the Jacobi's method tries to converge to the U(i,j) by closing to it using iterations as follows: U i; j; m þ 1 ð Þ ¼ ðUði À 2; j; mÞ þ Uði þ 2; j; mÞ þ Uði; j À 2; mÞ þ Uði; j þ 2; mÞ þ bði; jÞÞ 4 ð15Þ where m indicates the iteration number. The algorithm starts from U(i,j,0) as its initialization. The information at least requires n steps to spread to all points, since in each iteration one neighbor point is propagated. However, for Jacobi's iterative model, the constant factor of decreasing error is calculated by [89]: after m iterations, the error is: to obtain an accuracy that the decreasing rate becomes α (0< α<1), m should be:

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons As a result, if n is large enough, at least m should be: As it can be concluded from the above equation, the minimum number of m depends on n, and N = n 2 is the size of the image. So, the method has a cost time of O(N 2 ); i.e. the m steps that each of them has cost of O(N).

Experimental results
Different MR and CT images have been served to make some Experiments; showing how the proposed method works and what advantages it has compared to the previous methods. The images are 256 by 256 and can be downloaded from the whole brain atlas website (Aanlib (http://www.med.harvard.edu/aanlib/home.html)). The aim in Experiments is fusing two images with complementary information to demonstrate patients with acute stroke, cerebral toxoplasmosis, vascular dementia and AIDS dementia.
As the first Experiment that is shown in Fig 4, two CT and MR images of a patient with acute stroke in the brain are fused with different methods. In Fig 4C-4H, red boxes show artifacts and information losses that are introduced using different methods. Fig 5 shows another Experiment that demonstrates fusing CT and MR images of a brain with acute stroke. As before, the fused regions with lack of information or detail distortion are illustrated with red boxes. In

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons are propagated along the images, which make artifacts in the fused images. These artifacts are obvious in the corners of fused images.
In General, Figs 4-11 demonstrate that the proposed method preserves more details especially in edges and overlapped boundaries, which make them better vision.
In Fig 12, the proposed method is implemented on different experiments with some complementary multimodal medical images. In these experiments CT, MR-T1, MR-T2, MR-PD, and MR-GAD images are used. The quantitative comparisons are done in the next section to statistically compare the proposed method with the previous ones.

Evaluations and comparisons
Traditionally, image fusion methods were compared to each other using subjective evaluations. In Figs 4-11, subjective evaluations are employed to show drawbacks of previously image fusion methods that employed on experiments 1-8. However, for an image, subjective evaluation may vary based on observer's background. Moreover, embedding it into the fusion algorithms seems impossible. But, because of the importance of evaluating image fusion methods using subjective tests, objective metrics were introduced as alternates that are consistent with human visual perception. However, even if all the objective evaluation metrics agree that the proposed fusion method has better results on the experiments than the previous ones, two questions arise about the differences between the proposed and the previous fusion methods: • What is the probability that these differences occur by chance?
• If the differences exist, how strong is it?

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons Statistical comparisons express the probability that the differences exist and test their significance.
Therefore, in this section, at first, the proposed method is evaluated using different objective evaluation metrics. Secondly, results of objective evaluations are statistically compared to find if the proposed method is significantly different from the previous image fusion methods.

Objective evaluations
There are two classes of objective evaluation metrics for images. In the first class, the image is compared to a ground-truth image. So, quantitative comparisons can be implemented between

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons them. Some of the metrics in this class are: cross-correlation (CC), difference entropy (DE), mean absolute error (MAE), mutual information (MI), peak signal to noise ratio (PSNR), quality index (QI), root mean square error (RMSE), and structural similarity (SSIM) index [104].
The second class is evaluating the images in the absence of ground-truth. Since usually there is no ground-truth fused image to compare with, the fused image may either be evaluated by some known indexes or be compared to the source images using some non-reference metrics. Some known indexes are standard deviation (STD), Average Gradient (AG), Entropy, and Edge Intensity. On the other hand, some of the non-reference objective metrics that are used in this paper, are as follows: • Objective gradient based Image Fusion Performance Measure (Q AB/F ) [105]: it computes the fusion performance by calculating transferred edge information from the sources images into the fused image.
• MI [91,106]: as the most commonly used objective metric, it uses the mutual information to find how the salient features are dispersed during the fusion process.
• Piella's quality index (Q E ) [107]: its measure is based on image quality index and locally calculates the salient information that contained during the fusion.
• Zhao's phase congruency (Q p ) [104]: the measure is based on feature and is computed by phase congruency and its corresponding moments.
• Chen's quality metric [108]: it compares visual differences between the fused image and the source images in the sequence of: dividing images into different local regions, transforming to the frequency domain, weighting differences with a human contrast sensitivity function and computing the MSE of weighted differences. The quality metric is the weighted summation of the local regional images quality measures.
• Wang's performance evaluation [109]: at first, it computes mutual information between each pair of images (either fused or source images). Then the metric is computed using the correlation matrix and eigenvalues.
Besides visually showing the drawbacks of some previous methods in the last section, in this section some of above objective evaluation measures are used to compare the proposed methods with the rest methods. Different metrics have been mentioned for image fusion, one of which evaluates the previous methods in this field and the new methods are provided by An Objective Evaluation Metric, which is a better metric than the others [110].
In All indexes above are employed to quantitatively evaluate fusion methods on all Experiments. From the viewpoint of Mutual Information, DWT and FSD have the worst results while the Deweighted is the best fusing method. Structural Similarity finds that FSD Pyramid and Gradient Pyramid are very bad at fusing images while Nabla-weighted and Nabla-max are almost the best. Entropy considers FSD and GP as the worst methods and SIDWT and LP as the best methods. Qabf found that RP is the worst, while Nabla-weighted and Nabla-max are the best.
In general, almost all non-reference objective fusion metrics find the proposed methods as the best fusion methods. SSIM and MI, which compare the fused image with each source ones and average the results, agree with non-reference objective metrics.

Statistical comparisons
To report the significance difference between output variables, there are different statistical tests that have their own goals, advantages and limitations. To choose the correct statistical test for analyzing data, a user should at first consider some issues, i.e., what the outcome variable is, how many samples there are, the variables are correlated or independent, all the samples are drawn from a normal distributions or not, the values are scalar, ordinal or nominal, and if the goal is to compare groups how many they are.
In this case, the goal is to compare different fused method mentioned above. In the statistical test view, the outcome variables are the performances of the fusion methods. The nullhypothesis assumes that by ignoring the little differences between the results of fusion methods, they have same results. However, the goal of rejecting the null-hypothesis is to show that these differences are not random, and some of them are really different from the other ones. As they are measured across the same source images, the fusion methods are correlated. So, comparing performance between more than two correlated fusion methods leads us to the repeated-measures ANOVA (or within-subjects ANOVA) [111].
At first glance, it seems that ANOVA is appropriate for this work. However, there are some limitations associated with ANOVA that are conflict with our study; i.e. normality of samples and variance equality of variables. To test the normality of fusion methods, as the first limitation, we run some exploratory analyses on their values in 64 data computed before (8 evaluation criteria on above 8 experiments). The common methods to test if data is drawn from a normal distribution are Kolmogorov-Smirnov and Shapiro-Wilk tests. The more the sample size increases, the less important testing normality of the samples is. Statisticians believe that the Shapiro-Wilk test can suitably check the normality with sample size below 2000, especially for less than 50 samples. In this study with 64 samples, simply both Kolmogorov-Smirnov and Shapiro-Wilk tests were implemented, and as results the samples were significantly different from the normal distribution (p = 0.05, significance <0.001). There are two reasons that reduce the importance of normality test in this work. The first reason is that no factor tends to normalize the distribution of image fusion methods' accuracy over a set of images, so they are

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons almost always abnormal unless the sample size is very enough. The second reason is that there are different methods can normalize the data. So, the first limitation of ANOVA can be overlooked. However, the second limitation, test of variance equality, is a bit different. Although test for homogeneity of variances can be done using Levene's test, independence of image fusion methods and independence of data sets cause violation on this limit. The role of this limitation will be highlighted when the post-hoc tests are implemented. Therefore, ANOVA should be replaced by a suitable substitute, which has no violation with this work.
Friedman test. As an alternative non-parametric test for repeated measure ANOVA, the Friedman test [112,113] can detect the differences among multiple fusion methods. This model ranks the fusion methods' performance for each sample. Then the average rank of each fusion method is computed by averaging all the ranks that each fusion method obtained. The results show the average rank (namely mean rank) in scalar type. The higher differences between two fusion methods' mean rank, the more distinct they are. The mean rank of fusion methods in Experiments 1-4 are shown in Table 1.
To implement the Friedman test in this study, with N rows of samples and k columns of fusion methods, at first compute ranks within each row. Assume r ij is the rank of i-th element in the column j. Therefore, the test statistic is calculated by Q ¼ S S t = S S e in which: For large N, k (N>15, k>4) the chi-squared distribution can approximated the probability distribution Q, and then the p-value is Pðw 2 kÀ 1 � QÞ. The significant p-value leads post-hoc tests. Using Chi-square as distribution, the value of this statistic is:   [114], is like Friedman test and finds the similarity between raters. As the second alternative, Cochran's Q test [115], which is extended version of McNemar's test, is used for dichotomous data.

NSCT (2015) PCNN-NSCT (2008) NSCT-SR (2015) m-PCNN (2008) SCM-F (2013) SCM-M (2016) Nabla-PCA Nabla-weighted
In this study, The Friedman test and their alternatives just consider the significantly difference between intended fusion methods as a whole. However, if Friedman detect the methods are significantly different, there are some post-hoc tests for multiple pairwise comparisons on fusion methods.
The procedures of post-hoc after Friedman test, as a non-parametric one, are categorized in two classes. The first class tests use critical differences and then find the difference between fusion methods. If their difference is bigger than the critical difference, they are significantly different; otherwise, that post-hoc test is not powerful enough to find the significant difference between two fusion methods. In the second class, the differences between fusion methods are found by controlling family-wise error. It is done by decreasing the multiplicity problem effect.
Non-parametric Kruskal-Wallis test. Like Tukey test for ANOVA, the goal is to compare pairwise fusion methods when none of them is single-out. Designed by Siegel and Castellan [116], this test finds significance differences between fusion methods by computing a critical difference. One fusion method is significantly different from the other one if their Friedman's ranks difference is bigger than the critical difference of this test. Fig 13 shows differences between neighboring fusion methods' average ranks using Friedman test, which is obtained from Table 1. The critical difference for this post-hoc test is computed like the Nemenyi test [117] except in critical value, which makes it more powerful in finding significant differences. Its critical value is calculated by z table when the number of comparisons is corrected. The inequality for critical difference is: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where � R i is the Friedman rank of i-th fusion method, k is number of comparisons, z α/k(k−1) is critical value, and N is number of samples. In this work, α = 0.05 and k = 9. So, the critical value is computed by finding the corresponding value of z a=kðkþ1Þ ¼ z :05=9ð8Þ ¼ z :0007 from the standard normal probability  Fig 14 (wider gray  lines). For instance, jDel PCA À SCMMj ¼ j3:53 À 4:64j ¼ 1:11 < 2:064 while jDel PCA À mPCNNj ¼ j3:53 À 5:92j ¼ 3:39 > 2:064. So, this test finds Nabla-PCA is significantly different from m-PCNN and on the other hand has no powerful enough to find any significance difference between Nabla-PCA and SCM-M. This is shown in Fig 14 (column 3).
Bonferroni-Dunn post hoc test. Bonferroni-Dunn works like Non-parametric Kruskal-Wallis test with its own critical value (q α ). i.e.: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi The critical values with k groups can be found at two-tailed Bonferroni-Dunn test tables. In this case for N = 36, k = 9 fusion methods q 0.05 is about 2.95 (between 2.931 and 2.988) (http:// www.stat.ufl.edu/~winner/sta4210/bonferroni.doc), and then CD ¼ 2:95 ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 9ð9 þ 1Þ=6ð36Þ p ¼ 1:9. Since the rest of this method is like Non-parametric Kruskal-Wallis, we only take the results of this post hoc test in Fig 14 (narrower black lines) into account. As shown in Fig 14, this test detects that Nabla-PCA is significantly different from the SCM-F, while the previous test was not powerful enough to detect this significance. Holm and Hochberg post hoc tests. The Holm-Bonferroni method [118], in short the Holm method, finds the differences between fusion methods from another perspective. Besides its simplicity, it is more powerful than the Bonferroni correction. The Holm method controls family-wise error rate by decreasing the effect of multiplicity problem that increases by considering several hypotheses (comparisons between fusion methods in this study). As a matter of fact, controlling the family-wise error rate means controlling the occurrence probability of Type I error (false positives), which is correlated with the number of comparisons between fusion methods. In this work, the goal is to find all fusion methods that are significantly different from the best ranked method (i.e. comparing Nabla-weighted with 8 fusion methods in maximum cases). Assuming we have k fusion methods, the Holm method sorts the k-1 rest fusion methods from the weakest to the most powerful, and name k-1 Hypotheses of insignificant difference between the sorted methods and the best one as H 1 , . . ., H (k-1) , respectively. Then it finds the p-values of their difference with the best method as follows: where R i is Friedman's average rank of method i from the Table 1, R k is average rank of the best method, and SE is standard error:

SE ¼
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi kðk þ 1Þ=6N For a significance level α, the Holm method finds the minimal index j (j = 1,. . .,k-1) in a way that The result is that the Holm method rejects all H 1 , . . .,H j-1 and do not reject H j ,. . .,H k-1 . In other words the j weakest methods are significantly different from the best one and the Holm method cannot decide about the significance difference between the rest methods and the best one.
To implement the Holm method in this work, we first sort the methods based on Friedman's results. The best one is Nabla-max and so we can compare it with 8 rest methods. To compute the p-values, we should first find standard error SE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 9ð10Þ=6ð36Þ p ¼ 0:646. Then p i can be computed from the Eq 45 to Eq 47 that are shown in Table 2. For example R NSCT À R Delmax SE ¼ 6:67À 3:08 0:646 ¼ 5:55. Then PðNSCTÞ ¼ 1 À zð5:55Þ þ zðÀ 5:55Þ. From the standard normal distribution table, for α = 0.05, z(5.55) = 1 and z(-5.55) = 1.4e-8. So, P(NSCT) = 1−1 +0 = 0. On the other hand for all comparing methods the value of α/(k-j) should be computed. For instance, NSCT has a value of a kÀ j ¼ 0:05 9À 1 ¼ 0:0063. Since the procedure is step-down, the Holm method first tests NSCT, which has the smallest p-value. Since P(NSCT) is less than   Table 2.
Alternative to the Holm method, the Hochberg one makes rejection of H 1 ,. . ., H j-1 by finding the maximal index j such that: The Hochberg method, as its step-up nature, first tests the Hypothesis with the largest pvalue, i.e. H(Nabla-weighted). Since P(Nabla-weighted) = 0.763> 0.05 = α/1, it cannot reject H Nabla max and continues to the next one. The next hypothesis H(Nabla-PCA) is NOT rejected since P(Nabla-PCA) = 0.49 < 0.025 = α/2. However, the next hypothesis is rejected, because P (SCM-M) = 0.0160 < 0.167 = α/3. So the Hochberg method rejects all hypotheses form H (SCM-M) to H(NSCT). The results are shown in Fig 15. Since the Hochberg compares the best method with the other ones starting from the most ranked fusion method, the Hochberg procedure is more powerful by construction. Fortunately, comparing fusion methods has no conflict with the limitations of Hochberg method. The independency between above defined hypotheses makes implementing Hochberg method without any assumption. Therefore, the Friedman and its post hoc tests express that there are significant differences between the proposed fusion methods and recent hybrid fusion methods. However, these statistical comparisons are employed to compare the proposed fusion methods with some other recent hybrid fusion methods in experiments 5-8 and some known fusion methods in all above 12 experiments. The Friedman results are detailed in Tables 3 and 4, where their Holm and Hochberg post hoc tests are shown in Figs 16 and 17.
In Table 3

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons

PLOS ONE
Objective evaluations of fused images and step-by-step statistical comparisons between proposed fusion methods and the some known ones. In both cases, the post hoc tests consider that the proposed fusion methods have significant differences with other fusion methods. In all statistical comparisons, there are significant differences between the proposed fusion methods and the previous ones, either known or recently hybrid fusion methods. All of them were not powerful enough to find significant difference between three proposed fusion methods, as it can be seen that Nabla-max was the best in experiments 1-4 and Nabla-weighted in experiments 5-8.

Discussion
To avoid the negative effect of similar experiments' repeatability, it is better to objective evaluations are employed on more experiments. Although many works in computer science compare the proposed method with other ones over a set of data, these comparisons are no longer acceptable in the statisticians' viewpoints due to two main reasons. The first one is that the proposed method may not be significantly different from some others, which means the difference may be merely random and so the null-hypothesis assumes that the methods perform the same. The second reason is that the datasets may not be sufficient for comparison-whether the sample size is low or they have dubious statistical foundations-and so they make unacceptable and exaggerated results. The statistical tests, on the other hand, show how the results are acceptable and how much the proposed method really differs from the other ones. Checking the guarantee of validity is made by employing proper statistical tests and if needed implementing their post-hoc tests [119]. Since the experiments were independence of each other, the normality of their distributions make no sense while the variances are inhomogeneity. To statistically compare image fusion methods over different experiments, the non-parametric tests are more suitable than the parametric ones. As a non-parametric statistical test, the Friedman test has no problem with above limitations and simply can be applied on objective evaluation measures. Moreover, it is stronger than others when comparing pairs of methods [119].
Different post-hoc tests for Friedman test were done for this work. Some of them were not enough powerful to find the significance differences between fusion methods. Wilcoxon signrank test is one of them, so its results no longer reported in this paper. Since the number of comparisons in this test made the significance level too small, the method wasn't enough powerful to detect most of significance difference between pairwise fusion methods. like Wilcoxon sign-rank test, the Nemeny post-hoc test wasn't enough powerful for both α = 0.05 and α = 0.1. Because of weakness of this post-hoc test, we just show its results in Fig 18. This figure shows a brief result of the post-hoc tests that are used after Friedman test. The horizontal lines are shown in three levels of grayness and wideness from wider and lighter gray to narrower and darker, which show the fusion methods that have significance difference with three proposed fusion methods from weak to strong. For example, In the viewpoint of Kruskal-Wallis (with α = .05), in experiments1-4, Nabla-PCA has significance difference with NSCT, PCNN-NSCT, NSCT-SR, m-PCNN and SCM-F, but it couldn't find any difference between Nabla-PCA and SCM-M method. As the same way, Nabla-Max and Nabla-weighted both are significantly different from NSCT to SCM-M fusion methods in this figure. As shown in this figure, different post-hoc methods have almost similar results with minor disagreements. However, the Holm and Hochberg post-hoc tests, as the most powerful ones, consider that all three proposed methods have significant difference with all previously fusion methods.
Plus the capability of the post-hoc tests to distinguish the significance differences, they are easy to calculate, intuitive, and visible in figures. The Holm and Hochberg post-hoc methods find the differences by controlling the family-wise error rate, which leads decreasing the occurrence probability of Type I error (false positives).
Cross-validation is not used in the computational method, but it is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is simple to implement, and results in expertise approximation that generally have a lower bias than other methods. This paper found that the fusion of medical Images is a promising that can be used in across all disciplines related to image processing.
We have defined and summarized a new operator and various evaluations, including statistical comparison for the fusion of Medical Images.
The field of multimodal fusion for deep learning in medical imaging is expanding. Future work should focus on Composition fusion of Medical Images using Nabla Operator and deep learning, including direct evaluation of different multimodal fusion approaches when applicable.

Conclusions
Since the multimodal medical images make complementary information, fusing them as a single image leads more accurate information that none of them have alone. The proposed method introduces a new medical image fusion method that is implemented in three steps. In the first step, the proposed method converts the source images into a vector field using Nabla operator. Converting scalar field to vector one leads having more information that had been hidden in the scalar one. In the second step, the proposed method fuses the peer to peer vectors of objects in source images using one of three fusion models: simple averaging, max, and weighted averaging. Since it done in the vector field, a ting reformulating for above models are occurred. In the last step, the proposed method reconstructs the fused image by the inverse transform process that is formulated in Section 2.3.
Different Experiments are implemented to show how the proposed method works. Moreover, different quantitative comparisons are applied to verify the advantages of the proposed method in comparison with some of the previously known fusion methods. They show that the proposed method had a much better performance than the previous works. However, the proposed method has some main advantages over the previous ones: i. Using a new vector field to fuse the source images makes the results independent of pixel intensities, the promising capability in complex geometries, and the ability to handle the fusion of corresponding pixels in source images that have contradictory values.
ii. It has less difference to the source images than the previous works.
iii. The statistical comparison tests believe that the proposed fusion methods have significant difference from the previous fusion methods; even recent hybrid ones.
iv. The fusion metrics objectively evaluated the proposed fusion method. The results show that it preserves more useful information than the others.
Figures of experiments and tables of measurements respectively compare the performance of proposed method in qualitative and quantitative manners with the previous fusion methods. All of them prove the excellence of the proposed method over the existing image fusion methods. Moreover, in section 5, this paper had a discussion to show how perfect the proposed vector field is for the aim of image fusion.